
feat: Added custom document metadata support#2766

Open
StoreksFeed wants to merge 12 commits into HKUDS:main from StoreksFeed:advanced-metadata-support

@StoreksFeed
Contributor

Description

This PR adds custom document metadata support across the backend, API, and WebUI. Users can now attach arbitrary key-value metadata to documents at insertion time, update it afterwards, and retrieve it alongside documents during queries.

Related Issues

None

Changes Made

Backend (lightrag/)

  • base.py: Added an include_metadata field to the QueryParam class
  • lightrag.py:
    • Updated ainsert() to accept and process a metadata parameter
    • Updated aquery() to support metadata via QueryParam
  • operate.py: Integrated metadata into document chunking and storage
  • utils.py: Added metadata serialization/deserialization utilities and validation helpers
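The utils.py helpers are not shown in this description. The following is a minimal sketch of what metadata validation and JSON serialization could look like, assuming metadata is restricted to a flat dict of non-empty string keys and scalar values. The names validate_metadata, serialize_metadata and deserialize_metadata are hypothetical and may not match the actual code.

```python
from __future__ import annotations

import json
from typing import Any

# Hypothetical helpers: names and validation rules are assumptions,
# not the actual API of lightrag/utils.py.
ALLOWED_VALUE_TYPES = (str, int, float, bool, type(None))


def validate_metadata(metadata: Any) -> dict[str, Any]:
    """Ensure metadata is a flat dict mapping non-empty string keys to JSON scalars."""
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a dict")
    for key, value in metadata.items():
        if not isinstance(key, str) or not key:
            raise ValueError(f"invalid metadata key: {key!r}")
        if not isinstance(value, ALLOWED_VALUE_TYPES):
            raise ValueError(
                f"unsupported value type for {key!r}: {type(value).__name__}"
            )
    return metadata


def serialize_metadata(metadata: dict[str, Any]) -> str:
    """Validate, then dump to JSON with sorted keys so equal dicts give equal strings."""
    return json.dumps(validate_metadata(metadata), sort_keys=True, ensure_ascii=False)


def deserialize_metadata(raw: str | None) -> dict[str, Any]:
    """Parse stored JSON back into a dict; a missing or empty value means no metadata."""
    if not raw:
        return {}
    return validate_metadata(json.loads(raw))
```

Sorting keys keeps the stored string stable, so the same metadata always serializes identically regardless of the order in which keys were supplied.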

API (lightrag/api/routers/)

  • document_routes.py:
    • Updated /documents/text and /documents/file endpoints to accept metadata
    • Added a PATCH /documents/{document_id}/metadata endpoint for updating metadata
  • query_routes.py:
    • Added metadata support to the /query endpoint via an include_metadata flag
    • Updated the query response to include document metadata when requested
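For API clients, here is a sketch of the two new request shapes using only the Python standard library. The JSON body layouts ({"metadata": {...}} for the PATCH endpoint, and an include_metadata key in the /query body) and the local server address are assumptions; the API documentation in the WebUI describes the actual schema.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Any

# Assumed local server address; adjust for your deployment.
BASE_URL = "http://localhost:9621"


def build_metadata_patch(document_id: str, metadata: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that updates a document's custom metadata.

    Assumption: the endpoint accepts a JSON body shaped like {"metadata": {...}}.
    """
    doc = urllib.parse.quote(document_id, safe="")  # escape IDs containing "/" etc.
    body = json.dumps({"metadata": metadata}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/documents/{doc}/metadata",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


def build_query(query: str, include_metadata: bool = True) -> urllib.request.Request:
    """Build a POST /query request; include_metadata as a body field is an assumption."""
    body = json.dumps({"query": query, "include_metadata": include_metadata}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing either request to urllib.request.urlopen() sends it; building the request separately keeps the payload easy to inspect and test without a running server.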

WebUI (lightrag_webui/)

  • api/lightrag.ts: Added TypeScript API client methods for metadata operations
  • components/documents/MetadataEditorDialog.tsx: New component for editing document metadata
  • features/DocumentManager.tsx: Integrated metadata editor dialog into document management UI
  • locales/en.json: Added strings for metadata features

Checklist

  • Changes tested locally (see notes)
  • Code reviewed
  • Documentation updated (see notes)
  • Unit tests added (if applicable)

Additional Notes

Implementation:

  • Custom metadata storage leverages the existing metadata logic used for system fields (is_duplicate, original_doc_id, original_track_id, processing_start_time, processing_end_time)
  • System-reserved fields are protected: they can be viewed but not edited, either through the WebUI (indicated by a lock icon) or through the API (which returns an error)
  • Metadata handling logic has been updated to merge the custom fields provided at insertion with the system fields added by the document processing pipeline
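The merge and protection rules above can be sketched as follows. The reserved set lists only the system fields named in this description, and rejecting reserved keys at insertion time (rather than silently dropping them) is an assumption; merge_metadata, apply_metadata_update and ReservedFieldError are hypothetical names, not the PR's actual code.

```python
from __future__ import annotations

from typing import Any

# System fields named in this PR description; the real reserved set may be larger.
RESERVED_FIELDS = frozenset({
    "is_duplicate",
    "original_doc_id",
    "original_track_id",
    "processing_start_time",
    "processing_end_time",
})


class ReservedFieldError(ValueError):
    """Raised when a caller tries to set a system-reserved metadata field (hypothetical)."""


def merge_metadata(custom: dict[str, Any], system: dict[str, Any]) -> dict[str, Any]:
    """Combine user metadata from insertion with fields added by the pipeline.

    Reserved keys in user input are rejected; system fields are layered on top.
    """
    conflicts = custom.keys() & RESERVED_FIELDS
    if conflicts:
        raise ReservedFieldError(f"reserved fields cannot be set: {sorted(conflicts)}")
    return {**custom, **system}


def apply_metadata_update(existing: dict[str, Any], updates: dict[str, Any]) -> dict[str, Any]:
    """Apply a PATCH-style update, refusing any change to reserved fields."""
    blocked = updates.keys() & RESERVED_FIELDS
    if blocked:
        raise ReservedFieldError(f"reserved fields are read-only: {sorted(blocked)}")
    return {**existing, **updates}
```

A dedicated exception type lets the API layer map it to an error response, and the WebUI could use the same reserved set to decide which fields get a lock icon.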

Testing:

  • Insertion with metadata (via the /documents/text endpoint) and metadata updates have been tested
  • Querying remains UNTESTED because it requires an embedding model and an LLM to be set up

Documentation:

  • API documentation has been updated within the WebUI interface
  • Additional markdown documentation for the metadata feature may be added later

@StoreksFeed
Contributor Author

Some of the unit tests seem to be failing. Could this be caused by the recent changes in f2c0aa8ac7a7e9af86f888691d20f346bba57c1d?
